Smart Paper Notes

Out-of-Distribution Detection with Reconstruction Error and Typicality-based Penalty

Genki Osada , Takahashi Tsubasa , Budrul Ahsan , Takashi Nishide

Categories: Machine Learning | Computer Vision

2022-12-24

The task of out-of-distribution (OOD) detection is vital to realize safe and reliable operation for real-world applications. After the failure of likelihood-based detection in high dimensions had been shown, approaches based on the typical set have been attracting attention; however, they still have not achieved satisfactory performance. Beginning by presenting the failure case of the typicality-based approach, we propose a new reconstruction error-based approach that employs normalizing flow (NF). We further introduce a typicality-based penalty, and by incorporating it into the reconstruction error in NF, we propose a new OOD detection method, penalized reconstruction error (PRE). Because the PRE detects test inputs that lie off the in-distribution manifold, it effectively detects adversarial examples as well as OOD examples. We show the effectiveness of our method through the evaluation using natural image datasets, CIFAR-10, TinyImageNet, and ILSVRC2012.


Scaling Private Deep Learning with Low-Rank and Sparse Gradients

Ryuichi Ito , Seng Pei Liew , Tsubasa Takahashi , Yuya Sasaki , Makoto Onizuka

Categories: Machine Learning

2022-07-06

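Applying differentially private stochastic gradient descent (DPSGD) to the training of modern, large-scale neural networks such as transformer-based models is a challenging task, because the magnitude of the noise added to the gradients at each iteration scales with the model dimension and significantly hinders learning. We propose a unified framework, $\textsf{LSG}$, that fully exploits the low-rank and sparse structure of neural networks to reduce the dimension of gradient updates and thereby alleviate the negative impact of DPSGD. The gradient updates are first approximated with a pair of low-rank matrices. A novel strategy is then used to sparsify the gradients, resulting in low-dimensional, less noisy updates that still retain the performance of the neural networks. Empirical evaluation on natural language processing and computer vision tasks shows that our method outperforms other state-of-the-art baselines.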
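
The dimension dependence described above can be illustrated with a minimal sketch of one generic DP-SGD aggregation step: per-example clipping followed by Gaussian noise. This is standard DP-SGD, not the LSG framework itself, and the names `clip`, `dpsgd_noisy_sum`, `clip_norm`, and `sigma` are assumptions introduced here for illustration. Because independent noise with standard deviation `sigma * clip_norm` is added to every coordinate, the expected squared norm of the noise grows linearly with the number of parameters, which is why reducing the dimension of the update helps.

```python
import math
import random


def clip(grad, clip_norm):
    """Scale a per-example gradient so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dpsgd_noisy_sum(per_example_grads, clip_norm, sigma, rng):
    """Sum the clipped per-example gradients and add Gaussian noise per coordinate."""
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        for i, g in enumerate(clip(grad, clip_norm)):
            total[i] += g
    # Noise scale is sigma * clip_norm in every coordinate (assumed parametrization).
    return [t + rng.gauss(0.0, sigma * clip_norm) for t in total]


def expected_noise_sq_norm(dim, clip_norm, sigma):
    """E[||noise||^2] = dim * (sigma * clip_norm)^2, i.e. linear in the dimension."""
    return dim * (sigma * clip_norm) ** 2
```

For example, `expected_noise_sq_norm(1000, 1.0, 2.0)` is ten times `expected_noise_sq_norm(100, 1.0, 2.0)`, while the clipped signal norm per example stays bounded by `clip_norm` regardless of dimension.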


Shuffle Gaussian Mechanism for Differential Privacy

Seng Pei Liew , Tsubasa Takahashi

Categories: Machine Learning | Machine Learning (Statistics)

2022-06-20

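We study the Gaussian mechanism in the shuffle model of differential privacy (DP). In particular, we characterize the mechanism's Rényi differential privacy (RDP), showing that it is of the form: $$ \epsilon(\lambda) \leq \frac{1}{\lambda-1} \log\left( \frac{e^{-\lambda/2\sigma^2}}{n^\lambda} \sum_{\substack{k_1+\dotsc+k_n=\lambda; \\ k_1,\dotsc,k_n \geq 0}} \binom{\lambda}{k_1,\dotsc,k_n} e^{\sum_{i=1}^{n} k_i^2/2\sigma^2} \right) $$ We further prove that the RDP is strictly upper-bounded by the Gaussian RDP without shuffling. The shuffle Gaussian RDP is advantageous when composing multiple DP mechanisms, where we demonstrate its improvement over the state-of-the-art approximate DP composition theorems in the privacy guarantees of the shuffle model. Moreover, we extend our study to the subsampled shuffle mechanism and the recently proposed shuffled check-in mechanism, which are protocols geared towards distributed/federated learning. Finally, an empirical study of these mechanisms is given to demonstrate the efficacy of employing the shuffle Gaussian mechanism under the distributed learning framework to guarantee rigorous user privacy.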
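
As a rough numerical illustration, the bound can be evaluated by brute force for small $n$ and integer $\lambda$. The sketch below assumes the bound is a multinomial sum over all non-negative integer tuples $(k_1, \dotsc, k_n)$ summing to $\lambda$, with prefactor $e^{-\lambda/2\sigma^2}/n^\lambda$; that reading, along with the function names, is an assumption made here and not code from the paper. With $n = 1$ the expression reduces to $\lambda/2\sigma^2$, the Gaussian RDP for unit sensitivity, which serves as a sanity check.

```python
import math


def compositions(total, parts):
    """Yield every tuple of `parts` non-negative integers that sums to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def multinomial(ks):
    """Multinomial coefficient (sum(ks))! / (k_1! * ... * k_n!)."""
    result = math.factorial(sum(ks))
    for k in ks:
        result //= math.factorial(k)
    return result


def shuffle_gaussian_rdp_bound(lam, n, sigma):
    """Evaluate the assumed RDP upper bound at integer order lam > 1 for n users."""
    # Work in log space so that the exponentials do not overflow.
    log_terms = [
        math.log(multinomial(ks)) + sum(k * k for k in ks) / (2 * sigma ** 2)
        for ks in compositions(lam, n)
    ]
    peak = max(log_terms)
    log_sum = peak + math.log(sum(math.exp(t - peak) for t in log_terms))
    log_prefactor = -lam / (2 * sigma ** 2) - lam * math.log(n)
    return (log_prefactor + log_sum) / (lam - 1)


def gaussian_rdp(lam, sigma):
    """RDP of the unshuffled Gaussian mechanism with unit sensitivity."""
    return lam / (2 * sigma ** 2)
```

The enumeration grows combinatorially in `lam` and `n`, so this is only practical for small values; it is meant to make the structure of the sum concrete, not to serve as a privacy accountant.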


Shuffled Check-in: Privacy Amplification towards Practical Distributed Learning

Seng Pei Liew , Satoshi Hasegawa , Tsubasa Takahashi

Categories: Machine Learning

2022-06-07

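Recent studies of distributed computation with formal privacy guarantees, such as differentially private (DP) federated learning, leverage random sampling of clients in each round (privacy amplification by subsampling) to achieve satisfactory levels of privacy. Achieving this, however, requires strong assumptions that may not hold in practice, including precise and uniform subsampling of clients and a highly trusted aggregator to process clients' data. In this paper, we explore a more practical protocol, shuffled check-in, to resolve these issues. The protocol relies on clients making independent and random decisions to participate in the computation, which removes the requirement of server-initiated subsampling and enables robust modelling of client dropouts. Moreover, a weaker trust model known as the shuffle model is employed instead of a trusted aggregator. To this end, we introduce new tools to characterize the Rényi differential privacy (RDP) of shuffled check-in. We show that our new techniques improve the privacy guarantee by at least three times over those using approximate DP's strong composition in various parameter regimes. Furthermore, we provide a numerical approach to track the privacy of a generic shuffled check-in mechanism, including distributed stochastic gradient descent (SGD) with the Gaussian mechanism. To the best of our knowledge, this is also the first evaluation of the Gaussian mechanism within the local/shuffle model under the distributed setting in the literature, which may be of independent interest.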
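
The participation step of such a protocol can be sketched as follows. This is a minimal simulation under stated assumptions: each client flips an independent coin with probability `check_in_prob` (a hypothetical parameter name), and a shuffler then permutes the reports of the clients that checked in. Any local randomization of the reports and the privacy accounting are omitted.

```python
import random


def shuffled_check_in(client_reports, check_in_prob, rng):
    """Simulate one round: independent client check-in followed by shuffling.

    No server-side sampling takes place; every client decides on its own
    whether to participate, and the shuffler hides which client sent what.
    """
    checked_in = [r for r in client_reports if rng.random() < check_in_prob]
    rng.shuffle(checked_in)  # uniformly random permutation, in place
    return checked_in
```

Calling `shuffled_check_in(list(range(100)), 0.3, random.Random(0))` returns a shuffled subset of the client reports whose size varies from round to round, in contrast to server-initiated subsampling with a fixed batch size.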


Point Cloud-based Proactive Link Quality Prediction for Millimeter-wave Communications

Shoki Ohta , Takayuki Nishio , Riichi Kudo , Kahoko Takahashi , Hisashi Nagata

Categories: Artificial Intelligence | Computer Vision | Machine Learning

2023-01-02

This study demonstrates the feasibility of point cloud-based proactive link quality prediction for millimeter-wave (mmWave) communications. Image-based methods to quantitatively and deterministically predict future received signal strength using machine learning from time series of depth images to mitigate the human body line-of-sight (LOS) path blockage in mmWave communications have been proposed. However, image-based methods have been limited in applicable environments because camera images may contain private information. Thus, this study demonstrates the feasibility of using point clouds obtained from light detection and ranging (LiDAR) for the mmWave link quality prediction. Point clouds represent three-dimensional (3D) spaces as a set of points and are sparser and less likely to contain sensitive information than camera images. Additionally, point clouds provide 3D position and motion information, which is necessary for understanding the radio propagation environment involving pedestrians. This study designs the mmWave link quality prediction method and conducts two experimental evaluations using different types of point clouds obtained from LiDAR and depth cameras, as well as different numerical indicators of link quality, received signal strength and throughput. Based on these experiments, our proposed method can predict future large attenuation of mmWave link quality due to LOS blockage by human bodies, therefore our point cloud-based method can be an alternative to image-based methods.


CLIPSep: Learning Text-queried Sound Separation with Noisy Unlabeled Videos

Hao-Wen Dong , Naoya Takahashi , Yuki Mitsufuji , Julian McAuley , Taylor Berg-Kirkpatrick

Categories: Computer Vision

2022-12-14

Recent years have seen progress beyond domain-specific sound separation for speech or music towards universal sound separation for arbitrary sounds. Prior work on universal sound separation has investigated separating a target sound out of an audio mixture given a text query. Such text-queried sound separation systems provide a natural and scalable interface for specifying arbitrary target sounds. However, supervised text-queried sound separation systems require costly labeled audio-text pairs for training. Moreover, the audio provided in existing datasets is often recorded in a controlled environment, causing a considerable generalization gap to noisy audio in the wild. In this work, we aim to approach text-queried universal sound separation by using only unlabeled data. We propose to leverage the visual modality as a bridge to learn the desired audio-textual correspondence. The proposed CLIPSep model first encodes the input query into a query vector using the contrastive language-image pretraining (CLIP) model, and the query vector is then used to condition an audio separation model to separate out the target sound. While the model is trained on image-audio pairs extracted from unlabeled videos, at test time we can instead query the model with text inputs in a zero-shot setting, thanks to the joint language-image embedding learned by the CLIP model. Further, videos in the wild often contain off-screen sounds and background noise that may hinder the model from learning the desired audio-textual correspondence. To address this problem, we further propose an approach called noise invariant training for training a query-based sound separation model on noisy data. Experimental results show that the proposed models successfully learn text-queried universal sound separation using only noisy unlabeled videos, even achieving competitive performance against a supervised model in some settings.


Data Augmentation by Selecting Mixed Classes Considering Distance Between Classes

Shungo Fujii , Yasunori Ishii , Kazuki Kozuka , Tsubasa Hirakawa , Takayoshi Yamashita , Hironobu Fujiyoshi

Categories: Computer Vision | Machine Learning (Statistics)

2022-09-12

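Data augmentation is an essential technique for improving recognition accuracy in object recognition using deep learning. Methods that generate mixed data from multiple data samples, such as mixup, can acquire new diversity that is not included in the training data and thus contribute to improved accuracy. However, because the data selected for mixing are sampled randomly throughout the training process, there are cases in which appropriate classes or data are not selected. In this study, we propose a data augmentation method that calculates the distance between classes based on class probabilities and can select data from suitable classes to be mixed during training. The mixed data are dynamically adjusted according to the training trend of each class to facilitate training. The proposed method is applied in combination with conventional methods for generating mixed data. Evaluation experiments show that the proposed method improves recognition performance on general and long-tailed image recognition datasets.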
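
For readers unfamiliar with mixup, the sketch below shows the standard mixing step together with one possible way to measure a distance between classes from class probabilities. The distance function is a hypothetical choice made here (Euclidean distance between mean predicted-probability vectors); the abstract does not specify the metric or the selection rule, so which classes count as suitable is left to the caller.

```python
import math


def mixup(x_a, y_a, x_b, y_b, lam):
    """Standard mixup: convex combination of two inputs and their one-hot labels."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y


def class_distance(mean_probs_a, mean_probs_b):
    """Assumed metric: Euclidean distance between two classes' mean
    predicted-probability vectors (not taken from the paper)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(mean_probs_a, mean_probs_b)))
```

In a training loop, `class_distance` would be recomputed as the model's predictions change, which is one way the choice of mixing partners could adapt to the training trend of each class.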


Expressions Causing Differences in Emotion Recognition in Social Networking Service Documents

Tsubasa Nakagawa , Shunsuke Kitada , Hitoshi Iyatomi

Categories: Natural Language Processing | Artificial Intelligence | Machine Learning

2022-08-30

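It is often difficult to correctly infer a writer's emotion from text exchanged online, and differences in recognition between writers and readers can be problematic. In this paper, we propose a new framework for detecting sentences that create differences in emotion recognition between the writer and the reader, and for detecting the kinds of expressions that cause such differences. The proposed framework consists of a detector based on bidirectional encoder representations from transformers (BERT), which detects sentences causing differences in emotion recognition, and an analysis that acquires expressions that characteristically appear in such sentences. The detector, trained on a Japanese dataset of social networking service (SNS) documents annotated by the writer and three readers, detected "hidden-anger sentences" with AUC = 0.772; these sentences gave rise to differences in the recognition of anger. Because SNS documents contain many sentences whose meaning is extremely difficult to interpret, we analyzed the sentences detected by this detector and obtained several expressions that appear characteristically in hidden-anger sentences. The detected sentences and expressions do not convey anger explicitly, and it is difficult to infer the writer's anger from them, but once the implicit anger is pointed out, it becomes possible to guess why the writer is angry. In practical use, this framework would likely be able to mitigate problems caused by misunderstandings.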
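
To make the reported metric concrete, the sketch below computes the area under the ROC curve (AUC) in its pairwise form: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name and the toy data in the usage note are illustrative assumptions and have no connection to the paper's dataset.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

For instance, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` gives 0.75, since three of the four positive-negative pairs are ordered correctly. An AUC of 0.772 therefore means that roughly 77% of such pairs are ranked correctly by the detector.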


Leveraging Symmetrical Convolutional Transformer Networks for Speech to Singing Voice Style Transfer

Shrutina Agarwal , Sriram Ganapathy , Naoya Takahashi

Categories: Machine Learning

2022-08-26

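In this paper, we propose a model to perform style transfer from speech to singing voice. In contrast to previous signal processing-based methods, which require high-quality singing templates or phoneme synchronization, we explore a data-driven approach to the problem of converting natural speech to singing voice. We develop a novel neural network architecture, called SymNet, which models the alignment of the input speech with the target melody while preserving the speaker identity and naturalness. The proposed SymNet model consists of a symmetrical stack of three types of layers: convolutional, transformer, and self-attention layers. The paper also explores novel data augmentation and generative loss annealing methods to facilitate model training. Experiments are performed on the NUS and NHSS datasets, which consist of parallel data of speech and singing voice. In these experiments, we show that the proposed SymNet model significantly improves the objective reconstruction quality over previously published methods and baseline architectures. Further, a subjective listening test confirms the improved quality of the audio obtained using the proposed approach (an absolute improvement of 0.37 in mean opinion score over the baseline system).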



Visual Explanation of Deep Q-Network for Robot Navigation by Fine-tuning Attention Branch

Yuya Maruyama , Hiroshi Fukui , Tsubasa Hirakawa , Takayoshi Yamashita , Hironobu Fujiyoshi , Komei Sugiura

Categories: Robotics

2022-08-18

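Robot navigation with deep reinforcement learning (RL) achieves high performance and performs well in complex environments. Meanwhile, interpreting the decision-making of deep RL models has become a critical problem for the safety and reliability of autonomous robots. In this paper, we propose a visual explanation method based on an attention branch for deep RL models. We connect the attention branch to a pre-trained deep RL model and train it in a supervised manner, using the action selected by the trained deep RL model as the correct label. Because the attention branch is trained to output the same result as the deep RL model, the obtained attention maps correspond to the agent's actions and offer higher interpretability. Experimental results on a robot navigation task show that the proposed method can generate interpretable attention maps for visual explanation.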
